AITopics | orthogonality constraint

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

Neural Information Processing SystemsJun-17-2026, 06:36:54 GMT

Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA functional decomposition is a powerful explainability method, as it breaks down black-box models into a unique sum of lower-dimensional functions, provided that input variables are independent. In standard learning settings, input variables are often dependent, and the Hoeffding decomposition is generalized through hierarchical orthogonality constraints. Such generalization leads to unique and sparse decompositions with well-defined main effects and interactions. However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. We show the convergence of TreeHFD, along with the main properties of orthogonality, sparsity, and causal variable selection.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
(2 more...)

Add feedback

A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

Liang, Xiao, Li, Shuang

arXiv.org Machine LearningApr-7-2026

Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps can be computationally demanding and slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.

artificial intelligence, machine learning, regression, (16 more...)

arXiv.org Machine Learning

2604.04726

Country:

North America > United States > Iowa (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Add feedback

Gradient Descent Meets Shift-and-Invert Preconditioning for Eigenvector Computation

Zhiqiang Xu

Neural Information Processing SystemsFeb-13-2026, 10:55:47 GMT

Shift-and-invert preconditioning, as a classic acceleration technique for the leading eigenvector computation, has received much attention again recently, owing to fast least-squares solvers for efficiently approximating matrix inversions in power iterations.

artificial intelligence, machine learning, power method, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Nevada (0.05)
North America > Canada > Quebec > Montreal (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

e33d974aae13e4d877477d51d8bafdc4-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 20:26:26 GMT

diagonal, edlae, regularization, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Los Gatos (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Orthogonal Soft Pruning for Efficient Class Unlearning

Gong, Qinghui, Yang, Xue, Tang, Xiaohu

arXiv.org Artificial IntelligenceNov-17-2025

Efficient and controllable data unlearning in federated learning remains challenging, due to the trade-off between forgetting and retention performance. Especially under non-independent and identically distributed (non-IID) settings, where deep feature entanglement exacerbates this dilemma. To address this challenge, we propose FedOrtho, a federated unlearning framework that combines orthogonalized deep convolutional kernels with an activation-driven controllable one-shot soft pruning (OSP) mechanism. FedOrtho enforces kernel orthogonality and local-global alignment to decouple feature representations and mitigate client drift. This structural independence enables precise one-shot pruning of forgetting-related kernels while preserving retained knowledge. FedOrtho achieves SOTA performance on CIFAR-10, CIFAR100 and TinyImageNet with ResNet and VGG frameworks, verifying that FedOrtho supports class-, client-, and sample-level unlearning with over 98% forgetting quality. It reduces computational and communication costs by 2-3 orders of magnitude in federated settings and achieves subsecond-level erasure in centralized scenarios while maintaining over 97% retention accuracy and mitigating membership inference risks.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.19891

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Law > Statutes (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings

Avants, Brian B., Tustison, Nicholas J., Stone, James R

arXiv.org Machine LearningNov-11-2025

Interpretable representation learning is a central challenge in modern machine learning, particularly in high-dimensional settings such as neuroimaging, genomics, and text analysis. Current methods often struggle to balance the competing demands of interpretability and model flexibility, limiting their effectiveness in extracting meaningful insights from complex data. We introduce Non-negative Stiefel Approximating Flow (NSA-Flow), a general-purpose matrix estimation framework that unifies ideas from sparse matrix factorization, orthogonalization, and constrained manifold learning. NSA-Flow enforces structured sparsity through a continuous balance between reconstruction fidelity and column-wise decorrelation, parameterized by a single tunable weight. The method operates as a smooth flow near the Stiefel manifold with proximal updates for non-negativity and adaptive gradient control, yielding representations that are simultaneously sparse, stable, and interpretable. Unlike classical regularization schemes, NSA-Flow provides an intuitive geometric mechanism for manipulating sparsity at the level of global structure while simplifying latent features. We demonstrate that the NSA-Flow objective can be optimized smoothly and integrates seamlessly with existing pipelines for dimensionality reduction while improving interpretability and generalization in both simulated and real biomedical data. Empirical validation on the Golub leukemia dataset and in Alzheimer's disease demonstrate that the NSA-Flow constraints can maintain or improve performance over related methods with little additional methodological effort. NSA-Flow offers a scalable, general-purpose tool for interpretable ML, applicable across data science domains.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2511.06425

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

Bénard, Clément

arXiv.org Machine LearningOct-30-2025

Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA functional decomposition is a powerful explainability method, as it breaks down black-box models into a unique sum of lower-dimensional functions, provided that input variables are independent. In standard learning settings, input variables are often dependent, and the Hoeffding decomposition is generalized through hierarchical orthogonality constraints. Such generalization leads to unique and sparse decompositions with well-defined main effects and interactions. However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. We show the convergence of TreeHFD, along with the main properties of orthogonality, sparsity, and causal variable selection. The high performance of TreeHFD is demonstrated through experiments on both simulated and real data, using our treehfd Python package (https://github.com/ThalesGroup/treehfd). Besides, we empirically show that the widely used TreeSHAP method, based on Shapley values, is strongly connected to the Hoeffding decomposition.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2510.24815

Country:

North America > United States (0.28)
Europe (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
(3 more...)

Add feedback

DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation

Liu, Jiaqi, Zhang, Lan, Yuan, Xiaoyong

arXiv.org Artificial IntelligenceSep-29-2025

Text-to-image diffusion models (DMs) inadvertently reproduce copyrighted styles and protected visual concepts, raising legal and ethical concerns. Concept erasure has emerged as a safeguard, aiming to selectively suppress such concepts through fine-tuning. However, existing methods do not scale to practical settings where providers must erase multiple and possibly conflicting concepts. The core bottleneck is their reliance on static erasure: a single checkpoint is fine-tuned to remove all target concepts, regardless of the actual erasure needs at inference. This rigid design mismatches real-world usage, where requests vary per generation, leading to degraded erasure success and reduced fidelity for non-target content. We propose DyME, an on-demand erasure framework that trains lightweight, concept-specific LoRA adapters and dynamically composes only those needed at inference. This modular design enables flexible multi-concept erasure, but naive composition causes interference among adapters, especially when many or semantically related concepts are suppressed. To overcome this, we introduce bi-level orthogonality constraints at both the feature and parameter levels, disentangling representation shifts and enforcing orthogonal adapter subspaces. We further develop ErasureBench-H, a new hierarchical benchmark with brand-series-character structure, enabling principled evaluation across semantic granularities and erasure set sizes. Experiments on ErasureBench-H and standard datasets (e.g., CIFAR-100, Imagenette) demonstrate that DyME consistently outperforms state-of-the-art baselines, achieving higher multi-concept erasure fidelity with minimal collateral degradation.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.21433

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Riemannian Optimization for LoRA on the Stiefel Manifold

Park, Juneyoung, Kang, Minjae, Lee, Seongbae, Lee, Haegang, Kim, Seongwan, Lee, Jaeho

arXiv.org Artificial IntelligenceAug-26-2025

While powerful, large language models (LLMs) present significant fine-tuning challenges due to their size. Parameter-efficient fine-tuning (PEFT) methods like LoRA provide solutions, yet suffer from critical optimizer inefficiencies; notably basis redundancy in LoRA's $B$ matrix when using AdamW, which fundamentally limits performance. We address this by optimizing the $B$ matrix on the Stiefel manifold, imposing explicit orthogonality constraints that achieve near-perfect orthogonality and full effective rank. This geometric approach dramatically enhances parameter efficiency and representational capacity. Our Stiefel optimizer consistently outperforms AdamW across benchmarks with both LoRA and DoRA, demonstrating that geometric constraints are the key to unlocking LoRA's full potential for effective LLM fine-tuning.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.17901

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

Add feedback

On Prior Distributions for Orthogonal Function Sequences

Sugasawa, Shonosuke, Mochihashi, Daichi

arXiv.org Machine LearningAug-22-2025

We propose a novel class of prior distributions for sequences of orthogonal functions, which are frequently required in various statistical models such as functional principal component analysis (FPCA). Our approach constructs priors sequentially by imposing adaptive orthogonality constraints through a hierarchical formulation of conditionally normal distributions. The orthogonality is controlled via hyperparameters, allowing for flexible trade-offs between exactness and smoothness, which can be learned from the observed data. We illustrate the properties of the proposed prior and show that it leads to nearly orthogonal posterior estimates. The proposed prior is employed in Bayesian FPCA, providing more interpretable principal functions and efficient low-rank representations. Through simulation studies and analysis of human mobility data in Tokyo, we demonstrate the superior performance of our approach in inducing orthogonality and improving functional component estimation.

artificial intelligence, machine learning, principal function, (18 more...)

arXiv.org Machine Learning

2508.15552

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.25)
Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Modeling & Simulation (0.89)

Add feedback

Filters

Collaborating Authors

orthogonality constraint

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

Gradient Descent Meets Shift-and-Invert Preconditioning for Eigenvector Computation

e33d974aae13e4d877477d51d8bafdc4-Paper.pdf

Orthogonal Soft Pruning for Efficient Class Unlearning

Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation

Riemannian Optimization for LoRA on the Stiefel Manifold

On Prior Distributions for Orthogonal Function Sequences